Abstract

We present very efficient active learning algorithms for link classification in signed
networks. Our algorithms are motivated by a stochastic model in which edge labels are
obtained through perturbations of an initial sign assignment consistent with a two-clustering of
the nodes. We provide a theoretical analysis within this model, showing that we can achieve
an optimal (to within a constant factor) number of mistakes on any graph $G = (V, E)$
such that $|E| = \Omega(|V|^{3/2})$ by querying $O(|V|^{3/2})$ edge labels. More generally, we show
an algorithm that achieves optimality to within a factor of $O(k)$ by querying at most order
of $|V| + (|V|/k)^{3/2}$ edge labels. The running time of this algorithm is at most of order
$|E| + |V| \log |V|$.

*This work was supported in part by the PASCAL2 Network of Excellence under EC grant 216886 and by "Dote 
Ricerca", FSE, Regione Lombardia. This publication only reflects the authors' views. 
1 Introduction



A rapidly emerging theme in the analysis of networked data is the study of signed networks. From 
a mathematical point of view, signed networks are graphs whose edges carry a sign representing the 
positive or negative nature of the relationship between the incident nodes. For example, in a protein 
network two proteins may interact in an excitatory or inhibitory fashion. The domain of social 
networks and e-commerce offers several examples of signed relationships: Slashdot users can tag 
other users as friends or foes, Epinions users can rate other users positively or negatively, eBay
users develop trust and distrust towards sellers in the network. More generally, two individuals 
that are related because they rate similar products in a recommendation website may agree or 
disagree in their ratings. 

The availability of signed networks has stimulated the design of link classification algorithms, 
especially in the domain of social networks. Early studies of signed social networks date back to the
1950s. For example, [13] and [1] model dislike and distrust relationships among individuals as (signed)
weighted edges in a graph. The conceptual underpinning is provided by the theory of social
balance, formulated as a way to understand the structure of conflicts in a network of individuals whose
mutual relationships can be classified as friendship or hostility [14]. The advent of online social
networks has revived interest in these theories and spurred a significant amount of recent
work; see, e.g., [12, 16, 19, 8, 10, 7] and references therein.

Many heuristics for link classification in social networks are based on a form of social balance 
summarized by the motto "the enemy of my enemy is my friend". This is equivalent to saying 
that the signs on the edges of a social graph tend to be consistent with some two-clustering of the 
nodes. By consistency we mean the following: The nodes of the graph can be partitioned into two 
sets (the two clusters) in such a way that edges connecting nodes from the same set are positive, 
and edges connecting nodes from different sets are negative. Although two-clustering heuristics 
do not require strict consistency to work, this is admittedly a rather strong inductive bias. Despite
that, social network theorists and practitioners found this to be a reasonable bias in many social 
contexts, and recent experiments with online social networks reported a good predictive power for 
algorithms based on the two-clustering assumption [16, 18, 19, 8]. Finally, this assumption is also 
fairly convenient from the viewpoint of algorithmic design. 

In the case of undirected signed graphs $G = (V, E)$, the best performing heuristics exploiting
the two-clustering bias are based on spectral decompositions of the signed adjacency matrix.
Notably, these heuristics run in time $\Omega(|V|^2)$, and often require a similar amount of memory
storage even on sparse networks, which makes them impractical on large graphs.

In order to obtain scalable algorithms with formal performance guarantees, we focus on the 
active learning protocol, where training labels are obtained by querying a desired subset of edges. 
Since the allocation of queries can match the graph topology, a wide range of graph-theoretic 
techniques can be applied to the analysis of active learning algorithms. In the recent work [7], a 
simple stochastic model for generating edge labels by perturbing some unknown two-clustering 
of the graph nodes was introduced. For this model, the authors proved that querying the edges 
of a low-stretch spanning tree of the input graph $G = (V, E)$ is sufficient to predict the remaining
edge labels making a number of mistakes within a factor of order $(\log |V|)^2 \log\log |V|$ from
the theoretical optimum. The overall running time is $O(|E| \ln |V|)$. This result leaves two main
problems open: First, low-stretch trees are a powerful structure, but the algorithm to construct
them is not easy to implement. Second, the tree-based analysis of [7] does not generalize to query
budgets larger than $|V| - 1$ (the edge set size of a spanning tree). In this paper we introduce a
different active learning approach for link classification that can accommodate a large spectrum of
query budgets. We show that on any graph with $\Omega(|V|^{3/2})$ edges, a query budget of $O(|V|^{3/2})$ is
sufficient to predict the remaining edge labels within a constant factor from the optimum. More
generally, we show that a budget of at most order of $|V| + (|V|/k)^{3/2}$ queries is sufficient to make
a number of mistakes within a factor of $O(k)$ from the optimum with a running time of order
$|E| + (|V|/k) \log(|V|/k)$. Hence, a query budget of $O(|V|)$, of the same order as the algorithm
based on low-stretch trees, achieves an optimality factor $O(|V|^{1/3})$ with a running time of just
$O(|E|)$.

At the end of the paper we also report on a preliminary set of experiments on medium-sized 
synthetic and real-world datasets, where a simplified algorithm suggested by our theoretical find- 
ings is compared against the best performing spectral heuristics based on the same inductive bias. 
Our algorithm seems to perform similarly to, or better than, these heuristics.

2 Preliminaries and notation 

We consider undirected and connected graphs $G = (V, E)$ with unknown edge labeling $Y_{i,j} \in \{-1, +1\}$
for each $(i, j) \in E$. Edge labels can collectively be represented by the associated signed
adjacency matrix Y, where $Y_{i,j} = 0$ whenever $(i, j) \notin E$. In the sequel, the edge-labeled graph G
will be denoted by $(G, Y)$.

We define a simple stochastic model for assigning binary labels Y to the edges of G. This 
is used as a basis and motivation for the design of our link classification strategies. As we men- 
tioned in the introduction, a good trade-off between accuracy and efficiency in link classification 
is achieved by assuming that the labeling is well approximated by a two-clustering of the nodes. 
Hence, our stochastic labeling model assumes that edge labels are obtained by perturbing an under- 
lying labeling which is initially consistent with an arbitrary (and unknown) two-clustering. More 
formally, given an undirected and connected graph $G = (V, E)$, the labels $Y_{i,j} \in \{-1, +1\}$, for
$(i, j) \in E$, are assigned as follows. First, the nodes in V are arbitrarily partitioned into two
sets, and labels $Y_{i,j}$ are initially assigned consistently with this partition (within-cluster edges are
positive and between-cluster edges are negative). Note that the consistency is equivalent to the
following multiplicative rule: For any $(i, j) \in E$, the label $Y_{i,j}$ is equal to the product of signs on
the edges of any path connecting i to j in G. This is in turn equivalent to saying that any simple cycle
within the graph contains an even number of negative edges. Then, given a nonnegative constant
$p < 1/2$, labels are randomly flipped in such a way that $\mathbb{P}(Y_{i,j} \text{ is flipped}) \le p$ for each $(i, j) \in E$.
We call this a p-stochastic assignment. Note that this model allows for correlations between flipped
labels.
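
As an illustration, the following Python sketch draws one particular p-stochastic assignment, under the extra assumption that flips are independent (the model itself also allows correlated flips). The function name `p_stochastic_labels` and its arguments are hypothetical and are not part of the paper.

```python
import random

def p_stochastic_labels(edges, cluster, p, rng):
    """Return {edge: sign} for a p-stochastic assignment with independent flips.

    edges   : iterable of (i, j) pairs
    cluster : dict mapping each node to 0 or 1 (the hidden two-clustering)
    p       : flip probability, assumed to satisfy 0 <= p < 1/2
    rng     : random.Random instance, passed in for reproducibility
    """
    labels = {}
    for i, j in edges:
        # Consistent label: +1 within a cluster, -1 across clusters.
        sign = 1 if cluster[i] == cluster[j] else -1
        # Independent perturbation: flip with probability p.
        if rng.random() < p:
            sign = -sign
        labels[(i, j)] = sign
    return labels
```

With p = 0 the output is consistent with the two-clustering, so the product of the signs along any cycle equals +1, which is the multiplicative rule stated above.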

A learning algorithm in the link classification setting receives a training set of signed edges 
and, out of this information, builds a prediction model for the labels of the remaining edges. It is 
quite easy to prove a lower bound on the number of mistakes that any learning algorithm makes in 
this model. 

Fact 1. For any undirected graph $G = (V, E)$, any training set $E_0 \subset E$ of edges, and any learning
algorithm A that is given the labels of the edges in $E_0$, the number M of mistakes made by A on the
remaining $E \setminus E_0$ edges satisfies $\mathbb{E} M \ge p\,|E \setminus E_0|$, where the expectation is with respect to a
p-stochastic assignment of the labels Y.

Proof. Let Y be the following randomized labeling: first, edge labels are set consistently with an
arbitrary two-clustering of V. Then, a set of $2p|E|$ edges is selected uniformly at random and the
labels of these edges are set randomly (i.e., flipped or not flipped with equal probability). Clearly,
$\mathbb{P}(Y_{i,j} \text{ is flipped}) = p$ for each $(i, j) \in E$. Hence this is a p-stochastic assignment of the labels.
Moreover, $E \setminus E_0$ contains in expectation $2p\,|E \setminus E_0|$ randomly labeled edges, on which A makes
$p\,|E \setminus E_0|$ mistakes in expectation. □

In this paper we focus on active learning algorithms. An active learner for link classification
first constructs a query set $E_0$ of edges, and then receives the labels of all edges in the query
set. Based on this training information, the learner builds a prediction model for the labels of the
remaining edges $E \setminus E_0$. We assume that the only labels ever revealed to the learner are those in the
query set. In particular, no labels are revealed during the prediction phase. It is clear from Fact 1
that any active learning algorithm that queries the labels of at most a constant fraction of the total
number of edges will make on average $\Omega(p|E|)$ mistakes.

We often write $V_G$ and $E_G$ to denote, respectively, the node set and the edge set of some
underlying graph G. For any two nodes $i, j \in V_G$, $\mathrm{Path}(i, j)$ is any path in G having i and j as
terminals, and $|\mathrm{Path}(i, j)|$ is its length (number of edges). The diameter $D_G$ of a graph G is the
maximum over pairs $i, j \in V_G$ of the length of the shortest path between i and j. Given a tree
$T = (V_T, E_T)$ in G, and two nodes $i, j \in V_T$, we denote by $d_T(i, j)$ the distance of i and j within T,
i.e., the length of the (unique) path $\mathrm{Path}_T(i, j)$ connecting the two nodes in T. Moreover, $\pi_T(i, j)$
denotes the parity of this path, i.e., the product of edge signs along it. When T is a rooted tree, we
denote by $\mathrm{Children}_T(i)$ the set of children of i in T. Finally, given two disjoint subtrees
$T', T'' \subseteq G$ such that $V_{T'} \cap V_{T''} = \emptyset$, we let $E_G(T', T'') = \{(i, j) \in E_G : i \in V_{T'},\ j \in V_{T''}\}$.

3 Algorithms and their analysis 

In this section, we introduce and analyze a family of active learning algorithms for link
classification. The analysis is carried out under the p-stochastic assumption. As a warm-up, we start by
recalling the connection to the theory of low-stretch spanning trees (e.g., [9]), which turns out to
be useful in the important special case when the active learner is allowed to query only $|V| - 1$
labels.

Let $E_{\mathrm{flip}} \subset E$ denote the (random) subset of edges whose labels have been flipped in a
p-stochastic assignment, and consider the following class of active learning algorithms parameterized
by an arbitrary spanning tree $T = (V_T, E_T)$ of G. The algorithms in this class use $E_0 = E_T$ as
query set. The label of any test edge $e' = (i, j) \notin E_T$ is predicted as the parity $\pi_T(e')$. Clearly
enough, if a test edge $e'$ is predicted wrongly, then either $e' \in E_{\mathrm{flip}}$ or $\mathrm{Path}_T(e')$ contains at least
one flipped edge. Hence, the number of mistakes $M_T$ made by our active learner on the set of test
edges $E \setminus E_T$ can be deterministically bounded by

$$M_T \le |E_{\mathrm{flip}}| + \sum_{e' \in E \setminus E_T} \sum_{e \in E} \mathbb{I}\{e \in \mathrm{Path}_T(e')\}\, \mathbb{I}\{e \in E_{\mathrm{flip}}\} \qquad (1)$$

where $\mathbb{I}\{\cdot\}$ denotes the indicator of the Boolean predicate at argument. A quantity which can be
related to $M_T$ is the average stretch of a spanning tree T which, for our purposes, reduces to

$$\frac{1}{|E|} \Big[\, |V| - 1 + \sum_{e' \in E \setminus E_T} |\mathrm{Path}_T(e')| \,\Big].$$

A stunning result of [9] shows that every connected, undirected and unweighted graph has a
spanning tree with an average stretch of just $O(\log^2 |V| \log\log |V|)$. If our active learner uses a
spanning tree with the same low stretch, then the following result holds.

Theorem 1 ([7]). Let $(G, Y) = ((V, E), Y)$ be a labeled graph with p-stochastic assigned labels
Y. If the active learner queries the edges of a spanning tree $T = (V_T, E_T)$ with average stretch
$O(\log^2 |V| \log\log |V|)$, then $\mathbb{E} M_T \le p|E| \times O(\log^2 |V| \log\log |V|)$.

We call the quantity multiplying $p|E|$ in the upper bound the optimality factor of the algorithm.
Recall that Fact 1 implies that this factor cannot be smaller than a constant when the query set size
is a constant fraction of $|E|$.

Although low-stretch trees can be constructed in time $O(|E| \ln |V|)$, the algorithms are fairly
complicated (we are not aware of available implementations), and the constants hidden in the
asymptotics can be high. Another disadvantage is that we are forced to use a query set of small
and fixed size $|V| - 1$. In what follows we introduce algorithms that overcome both limitations.

A key aspect in the analysis of prediction performance is the ability to select a query set so that
each test edge creates a short circuit with a training path. This is quantified by
$\sum_{e \in E} \mathbb{I}\{e \in \mathrm{Path}_T(e')\}$ in (1). We make this explicit as follows. Given a test edge $(i, j)$
and a path $\mathrm{Path}(i, j)$ whose edges are queried edges, we say that we are predicting label $Y_{i,j}$
using path $\mathrm{Path}(i, j)$. Since $(i, j)$ closes $\mathrm{Path}(i, j)$ into a circuit, in this case we also say
that $(i, j)$ is predicted using the circuit.

Fact 2. Let $(G, Y) = ((V, E), Y)$ be a labeled graph with p-stochastic assigned labels Y. Given
query set $E_0 \subset E$, the number M of mistakes made when predicting test edges $(i, j) \in E \setminus E_0$ using
training paths $\mathrm{Path}(i, j)$ whose length is uniformly bounded by $\ell$ satisfies $\mathbb{E} M \le \ell p\,|E \setminus E_0|$.

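Proof. We have the chain of inequalities

$$
\begin{aligned}
\mathbb{E} M &\le \sum_{(i,j) \in E \setminus E_0} \Big(1 - (1 - p)^{|\mathrm{Path}(i,j)|}\Big) \\
&\le \sum_{(i,j) \in E \setminus E_0} \Big(1 - (1 - p)^{\ell}\Big) \\
&\le \sum_{(i,j) \in E \setminus E_0} \ell p \\
&\le \ell p\,|E \setminus E_0| .
\end{aligned}
$$

□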
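
The step from the second to the third line of the chain is not spelled out in the proof. One standard way to justify it (offered here as a reading aid, not as the authors' own argument) is Bernoulli's inequality, valid for $0 \le p \le 1$ and any integer $\ell \ge 1$:

```latex
(1 - p)^{\ell} \;\ge\; 1 - \ell p
\quad\Longrightarrow\quad
1 - (1 - p)^{\ell} \;\le\; \ell p .
% Numerical check: p = 0.1, \ell = 4 gives 1 - 0.9^4 = 0.3439 \le 0.4 .
```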

For instance, if the input graph $G = (V, E)$ has diameter $D_G$ and the queried edges are those of
a breadth-first spanning tree, which can be generated in $O(|E|)$ time, then the above fact holds with
$|E_0| = |V| - 1$ and $\ell = 2D_G$. Comparing to Fact 1 shows that this simple breadth-first strategy
is optimal up to constant factors whenever G has a constant diameter. This simple observation
is especially relevant in the light of the typical graph topologies encountered in practice, whose
diameters are often small. This argument is at the basis of our experimental comparison (see
Section 4).
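
The breadth-first strategy just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `bfs_tree_predictor` and the label oracle `query` are hypothetical, and the graph is given as an adjacency dictionary. The sketch stores, for each node, the product of the queried signs on its tree path to the root; the parity of the tree path between i and j is then the product of the two stored values, because signs on the shared segment toward the root appear twice and cancel.

```python
from collections import deque

def bfs_tree_predictor(adj, root, query):
    """Query the edges of a breadth-first spanning tree and return a predictor.

    adj   : dict node -> list of neighbours (undirected, connected graph)
    root  : starting node of the breadth-first visit
    query : callable (i, j) -> +1 or -1, the label oracle (hypothetical name)
    """
    sign_to_root = {root: 1}   # product of queried signs on the tree path root -> node
    frontier = deque([root])
    n_queries = 0
    while frontier:
        i = frontier.popleft()
        for j in adj[i]:
            if j not in sign_to_root:          # (i, j) becomes a tree edge
                sign_to_root[j] = sign_to_root[i] * query(i, j)
                n_queries += 1
                frontier.append(j)

    def predict(i, j):
        # Parity of the tree path between i and j.
        return sign_to_root[i] * sign_to_root[j]

    return predict, n_queries
```

On a connected graph the number of queries is always $|V| - 1$, and each prediction takes constant time after the visit.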

Yet, this mistake bound can be vacuous on graphs having a larger diameter. Hence, one may
think of adding to the training spanning tree new edges so as to reduce the length of the circuits 
used for prediction, at the cost of increasing the size of the query set. A similar technique based on 
short circuits has been used in [7], the goal there being to solve the link classification problem in 
a harder adversarial environment. The precise tradeoff between prediction accuracy (as measured 
by the expected number of mistakes) and fraction of queried edges is the main theoretical concern 
of this paper. 

We now introduce an intermediate (and simpler) algorithm, called treeCutter, which
improves on the optimality factor when the diameter $D_G$ is not small. In particular, we demonstrate
that treeCutter achieves a good upper bound on the number of mistakes on any graph such that
$|E| \ge 3|V| + \sqrt{|V|}$. This algorithm is especially effective when the input graph is dense, with
an optimality factor between $O(1)$ and $O(\sqrt{|V|})$. Moreover, the total time for predicting the test
edges scales linearly with the number of such edges, i.e., treeCutter predicts edges in constant
amortized time. Also, the space is linear in the size of the input graph.

The algorithm (pseudocode given in Figure 1) is parametrized by a positive integer k ranging
from 2 to $|V|$. The actual setting of k depends on the graph topology and the desired fraction
of query set edges, and plays a crucial role in determining the prediction performance. Setting
$k \ge D_G$ makes treeCutter reduce to querying only the edges of a breadth-first spanning tree of
G; otherwise it operates in a more involved way by splitting G into smaller node-disjoint subtrees.

In a preliminary step (Line 1 in Figure 1), treeCutter draws an arbitrary breadth-first spanning
tree $T = (V_T, E_T)$. Then subroutine extractTreelet(T, k) is used in a do-while loop
to split T into vertex-disjoint subtrees $T'$ whose height is k (one of them might have a smaller
height). extractTreelet(T, k) is a very simple procedure that performs a depth-first visit of
the tree T at argument. During this visit, each internal node may be visited several times (during
backtracking steps). We assign each node i a tag $h_T(i)$ representing the height of the subtree of T
rooted at i. $h_T(i)$ can be recursively computed during the visit. After this assignment, if we have
$h_T(i) = k$ (or i is the root of T) we return the subtree $T_i$ of T rooted at i. Then treeCutter
removes (Line 6) $T_i$ from T along with all edges of $E_T$ which are incident to nodes of $T_i$, and then
iterates until $V_T$ gets empty. By construction, the diameter of the generated subtrees will not be
larger than 2k. Let $\mathcal{T}$ denote the set of these subtrees. For each $T' \in \mathcal{T}$, the algorithm queries
all the labels of $E_{T'}$, each edge $(i, j) \in E_G \setminus E_{T'}$ such that $i, j \in V_{T'}$ is set to be a test edge, and
label $Y_{i,j}$ is predicted using $\mathrm{Path}_{T'}(i, j)$ (note that this coincides with $\mathrm{Path}_T(i, j)$, since $T' \subseteq T$),
that is, $\hat{Y}_{i,j} = \pi_T(i, j)$. Finally, for each pair of distinct subtrees $T', T'' \in \mathcal{T}$ such that there exists
a node of $V_{T'}$ adjacent to a node of $V_{T''}$, i.e., such that $E_G(T', T'')$ is not empty, we query the
label of an arbitrarily selected edge $(i', i'') \in E_G(T', T'')$ (Lines 8 and 9 in Figure 1). Each edge
$(u, v) \in E_G(T', T'')$ whose label has not been previously queried is then part of the test set, and
its label will be predicted as $\hat{Y}_{u,v} \leftarrow \pi_T(u, i') \cdot Y_{i',i''} \cdot \pi_T(i'', v)$ (Line 11). That is, using the path
obtained by concatenating $\mathrm{Path}_{T'}(u, i')$ to edge $(i', i'')$ to $\mathrm{Path}_{T''}(i'', v)$.
treeCutter(k)   Parameter: $k \ge 2$.
Initialization: $\mathcal{T} \leftarrow \emptyset$.

1. Draw an arbitrary breadth-first spanning tree T of G
2. Do
3.     $T' \leftarrow$ extractTreelet(T, k), and query all labels in $E_{T'}$
4.     $\mathcal{T} \leftarrow \mathcal{T} \cup \{T'\}$
5.     For each $i, j \in V_{T'}$, predict $\hat{Y}_{i,j} \leftarrow \pi_T(i, j)$
6.     $T \leftarrow T \setminus T'$
7. While ($V_T \ne \emptyset$)
8. For each $T', T'' \in \mathcal{T}$ : $T' \ne T''$
9.     If $E_G(T', T'') \ne \emptyset$, query the label of an arbitrary edge $(i', i'') \in E_G(T', T'')$
10.    For each $(u, v) \in E_G(T', T'') \setminus \{(i', i'')\}$, with $i', u \in V_{T'}$ and $v, i'' \in V_{T''}$
11.        predict $\hat{Y}_{u,v} \leftarrow \pi_{T'}(u, i') \cdot Y_{i',i''} \cdot \pi_{T''}(i'', v)$

Figure 1: treeCutter pseudocode.
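
The pseudocode of Figure 1 can be turned into a compact Python sketch. This is an illustrative reading rather than the authors' implementation: the name `tree_cutter`, the `query` oracle and the adjacency-dictionary input are assumptions, node identifiers are assumed to be comparable, and the repeated calls to extractTreelet are replaced by a single bottom-up pass over the breadth-first order that cuts a node off as soon as its remaining subtree reaches height k (an implementation shortcut chosen for brevity).

```python
from collections import deque

def tree_cutter(adj, k, query, root):
    """Sketch of treeCutter(k); returns (predicted, n_queries).

    adj   : dict node -> list of neighbours (undirected, connected graph)
    k     : treelet height, k >= 2
    query : callable (u, v) -> +1/-1 label oracle (hypothetical interface)
    predicted is keyed by (u, v) with u < v.
    """
    # Line 1: breadth-first spanning tree, stored as parent pointers.
    parent, order, frontier = {root: None}, [], deque([root])
    while frontier:
        u = frontier.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                frontier.append(v)

    # Lines 2-7: children are handled before parents; a node whose remaining
    # subtree has height k (or the root) becomes the top of a treelet.
    height, is_top = {u: 0 for u in order}, {}
    for u in reversed(order):
        is_top[u] = height[u] == k or parent[u] is None
        if not is_top[u]:
            height[parent[u]] = max(height[parent[u]], height[u] + 1)

    # Query the edges inside each treelet; sign[u] is the parity from the top to u.
    top, sign, n_queries = {}, {}, 0
    for u in order:
        if is_top[u]:
            top[u], sign[u] = u, 1
        else:
            top[u] = top[parent[u]]
            sign[u] = sign[parent[u]] * query(parent[u], u)
            n_queries += 1

    # Lines 8-11: one queried edge per pair of adjacent treelets, the rest predicted.
    bridge, predicted = {}, {}
    for u in order:
        for v in adj[u]:
            queried_tree_edge = ((parent[v] == u and not is_top[v])
                                 or (parent[u] == v and not is_top[u]))
            if u >= v or queried_tree_edge:
                continue                       # visit each remaining edge once
            if top[u] == top[v]:
                predicted[(u, v)] = sign[u] * sign[v]
                continue
            pair = frozenset((top[u], top[v]))
            if pair not in bridge:
                # Store the parity between the two treelet tops via the queried edge.
                bridge[pair] = sign[u] * query(u, v) * sign[v]
                n_queries += 1
            else:
                predicted[(u, v)] = sign[u] * bridge[pair] * sign[v]
    return predicted, n_queries
```

When the labels are consistent with a two-clustering (p = 0), every prediction returned by this sketch agrees with the true label, since each prediction is the product of queried signs along a path closing a circuit with the test edge.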



extractTreelet(T, k)   Parameters: tree T, $k \ge 2$.

1. Perform a depth-first visit of T starting from the root.
2. During the visit
3.     For each $i \in V_T$ visited for the $(1 + |\mathrm{Children}_T(i)|)$-th time (i.e., the last visit of i)
4.         If i is a leaf set $h_T(i) \leftarrow 0$
5.         Else set $h_T(i) \leftarrow 1 + \max\{h_T(j) : j \in \mathrm{Children}_T(i)\}$
6.         If $h_T(i) = k$ or i is T's root return subtree rooted at i

Figure 2: extractTreelet pseudocode.
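
A direct Python transcription of Figure 2 may help to see how the do-while loop of Figure 1 consumes the tree. The sketch below assumes a rooted tree given as a dictionary of child lists and assigns height 0 to leaves; the names `extract_treelet`, `subtree_nodes` and `split_into_treelets` are hypothetical. For clarity it recomputes heights from scratch at every call, so it does not attain the linear running time claimed for the paper's implementation.

```python
def subtree_nodes(children, top):
    """Collect the nodes of the subtree rooted at `top`."""
    nodes, stack = [], [top]
    while stack:
        v = stack.pop()
        nodes.append(v)
        stack.extend(children.get(v, []))
    return nodes

def extract_treelet(children, root, k):
    """Return (top, nodes) for the first subtree of height k met in a
    depth-first visit, or for the whole tree if the root is reached first."""
    height = {}
    stack = [(root, False)]
    while stack:
        node, last_visit = stack.pop()
        if not last_visit:
            stack.append((node, True))          # come back after all children
            stack.extend((c, False) for c in children.get(node, []))
            continue
        kids = children.get(node, [])
        height[node] = 1 + max(height[c] for c in kids) if kids else 0
        if height[node] == k or node == root:
            return node, subtree_nodes(children, node)

def split_into_treelets(children, root, k):
    """Repeatedly extract treelets until the root itself is returned."""
    children = {v: list(cs) for v, cs in children.items()}   # work on a copy
    parent = {c: v for v, cs in children.items() for c in cs}
    treelets = []
    while True:
        top, nodes = extract_treelet(children, root, k)
        treelets.append(nodes)
        if top == root:
            return treelets
        children[parent[top]].remove(top)       # cut the treelet off T
```

For example, on a path with seven nodes rooted at one endpoint and k = 2, the sketch returns two treelets of height 2 followed by a final treelet containing only the root.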



The following theorem$^{1}$ quantifies the number of mistakes made by treeCutter. The
requirement on the graph density in the statement, i.e.,
$|V| - 1 + \frac{|V|^2}{2k^2} + \frac{|V|}{2k} \le \frac{|E|}{2}$, implies that the
query set is not larger than the test set. This is a plausible assumption in active learning scenarios,
and a way of adding meaning to the bounds.

Theorem 2. For any integer $k \ge 2$, the number M of mistakes made by treeCutter on any
graph $G(V, E)$ with $|E| \ge 2|V| - 2 + \frac{|V|^2}{k^2} + \frac{|V|}{k}$ satisfies $\mathbb{E} M \le \min\{4k + 1, 2D_G\}\, p|E|$, while
the query set size is bounded by $|V| - 1 + \frac{|V|^2}{2k^2} + \frac{|V|}{2k} \le \frac{|E|}{2}$.

3.1 Refinements 

We now refine the simple argument leading to treeCutter, and present our active link classifier.
The pseudocode of our refined algorithm, called starMaker, follows that of Figure 1 with the
following differences: Line 1 is dropped (i.e., starMaker does not draw an initial spanning
tree), and the call to extractTreelet in Line 3 is replaced by a call to extractStar. This
new subroutine just selects the star $T'$ centered on the node of G having largest degree, and queries
all labels of the edges in $E_{T'}$. The next result shows that this algorithm gets a constant optimality
factor while using a query set of size $O(|V|^{3/2})$.

$^{1}$ Due to space limitations long proofs are presented in the supplementary material.

Theorem 3. The number M of mistakes made by starMaker on any given graph $G(V, E)$ with
$|E| \ge 2|V| - 2 + 2|V|^{3/2}$ satisfies $\mathbb{E} M \le 5p|E|$, while the query set size is upper bounded by
$|V| - 1 + |V|^{3/2} \le \frac{|E|}{2}$.
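
To make the star decomposition used by starMaker concrete, here is a small Python sketch of repeated star extraction. It rests on assumptions not spelled out in the text: after each extraction the star's nodes are removed and degrees are recomputed in the remaining graph, ties are broken arbitrarily, and node identifiers are comparable. The name `extract_stars` is hypothetical, and the linear scan for the maximum degree is chosen for readability rather than speed.

```python
def extract_stars(adj):
    """Greedily partition the nodes into stars centred on maximum-degree nodes.

    adj : dict node -> iterable of neighbours (undirected graph)
    Returns a list of (center, leaves) pairs.
    """
    remaining = {u: set(vs) for u, vs in adj.items()}
    stars = []
    while remaining:
        # Node of largest degree in what is left of the graph.
        center = max(remaining, key=lambda u: len(remaining[u]))
        leaves = sorted(remaining[center])
        stars.append((center, leaves))
        # Remove the star's nodes and every edge incident to them.
        for u in [center] + leaves:
            for w in remaining.pop(u):
                if w in remaining:
                    remaining[w].discard(u)
    return stars
```

The edges between a center and its leaves are the ones whose labels would be queried inside each star; edges between different stars are then handled as in Lines 8-11 of Figure 1.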

Finally, we combine starMaker with treeCutter so as to obtain an algorithm, called
treeletStar, that can work with query sets smaller than $|V| - 1 + |V|^{3/2}$ labels. treeletStar
is parameterized by an integer k and follows Lines 1-6 of Figure 1 creating a set $\mathcal{T}$ of trees through
repeated calls to extractTreelet. Lines 7-11 are instead replaced by the following procedure:
a graph $G' = (V_{G'}, E_{G'})$ is created such that: (1) each node in $V_{G'}$ corresponds to a tree in $\mathcal{T}$, (2)
there exists an edge in $E_{G'}$ if and only if the two corresponding trees of $\mathcal{T}$ are connected by at
least one edge of $E_G$. Then, extractStar is used to generate a set $\mathcal{S}$ of stars of vertices of $G'$,
i.e., stars of trees of $\mathcal{T}$. Finally, for each pair of distinct stars $S', S'' \in \mathcal{S}$ connected by at least one
edge in $E_G$, the label of an arbitrary edge in $E_G(S', S'')$ is queried. The remaining edges are all
predicted.

Theorem 4. For any integer $k \ge 2$ and for any graph $G = (V, E)$ with
$|E| \ge 2|V| - 2 + 2\big(\frac{|V| - 1}{k} + 1\big)^{3/2}$, the number M of mistakes made by treeletStar(k) on G satisfies
$\mathbb{E} M = O(\min\{k, D_G\})\, p|E|$, while the query set size is bounded by $|V| - 1 + \big(\frac{|V| - 1}{k} + 1\big)^{3/2} \le \frac{|E|}{2}$.

Hence, even if $D_G$ is large, setting $k = |V|^{1/3}$ yields a $O(|V|^{1/3})$ optimality factor just by
querying $O(|V|)$ edges. On the other hand, a truly constant optimality factor is obtained by
querying as few as $O(|V|^{3/2})$ edges (provided the graph has sufficiently many edges). As a direct
consequence (and surprisingly enough), on graphs which are only moderately dense we need not
observe too many edges in order to achieve a constant optimality factor. It is instructive to compare
the bounds obtained by treeletStar to the ones we can achieve by using the CCCC algorithm
of [7], or the low-stretch spanning trees given in Theorem 1.
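
The two regimes can be checked by plugging values of k into the query budget of order $|V| + (|V|/k)^{3/2}$ stated in the abstract. The short calculation below is only a sketch that ignores constants and lower-order terms.

```latex
k = |V|^{1/3}: \quad \left(\frac{|V|}{k}\right)^{3/2} = \left(|V|^{2/3}\right)^{3/2} = |V|
\;\Longrightarrow\; \text{budget } O(|V|), \quad \text{optimality factor } O(k) = O\!\left(|V|^{1/3}\right).

k = O(1): \quad \left(\frac{|V|}{k}\right)^{3/2} = O\!\left(|V|^{3/2}\right)
\;\Longrightarrow\; \text{budget } O\!\left(|V|^{3/2}\right), \quad \text{optimality factor } O(1).
```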

Because CCCC operates within a harder adversarial setting, it is easy to show that Theorem 9 in
[7] extends to the p-stochastic assignment model by replacing $\Delta_2(Y)$ with $p|E|$ therein.$^{2}$ The
resulting optimality factor is of order $\big(\frac{1 - \alpha}{\alpha}\big)^{3/2} \sqrt{|V|}$, where $\alpha \in (0, 1]$ is the fraction of queried edges
out of the total number of edges. A quick comparison to Theorem 4 reveals that treeletStar
achieves a sharper mistake bound for any value of $\alpha$. For instance, in order to obtain an optimality
factor which is lower than $\sqrt{|V|}$, CCCC has to query in the worst case a fraction of edges that goes
to one as $|V| \to \infty$. On top of this, our algorithms are faster and easier to implement (see Section
3.2).

Next, we compare to query sets produced by low-stretch spanning trees. A low-stretch spanning
tree achieves a polylogarithmic optimality factor by querying $|V| - 1$ edge labels. The results in [9]
show that we cannot hope to get a better optimality factor using a single low-stretch spanning tree
combined with the analysis in (1). For a comparable amount $O(|V|)$ of queried labels, Theorem
4 offers the larger optimality factor $|V|^{1/3}$. However, we can get a constant optimality factor by
increasing the query set size to $O(|V|^{3/2})$. It is not clear how multiple low-stretch trees could be
combined to get a similar scaling.

$^{2}$ This theoretical comparison is admittedly unfair, as CCCC has been designed to work in a harder setting than
p-stochastic. Unfortunately, we are not aware of any other general active learning scheme for link classification to
compare with.

3.2 Complexity analysis and implementation 

We now compute bounds on time and space requirements for our three algorithms. Recall the
different lower bound conditions on the graph density that must hold to ensure that the query set
size is not larger than the test set size. These were $|E| \ge 2|V| - 2 + \frac{|V|^2}{k^2} + \frac{|V|}{k}$ for treeCutter(k)
in Theorem 2, $|E| \ge 2|V| - 2 + 2|V|^{3/2}$ for starMaker in Theorem 3, and
$|E| \ge 2|V| - 2 + 2\big(\frac{|V| - 1}{k} + 1\big)^{3/2}$ for treeletStar(k) in Theorem 4.

Theorem 5. For any input graph $G = (V, E)$ which is dense enough to ensure that the query set
size is no larger than the test set size, the total time needed for predicting all test labels is:

$O(|E|)$ for treeCutter(k) and for all k;

$O(|E| + |V| \log |V|)$ for starMaker;

$O\big(|E| + \frac{|V|}{k} \log \frac{|V|}{k}\big)$ for treeletStar(k) and for all k.

In particular, whenever $k|E| = \Omega(|V| \log |V|)$ we have that treeletStar(k) works in constant
amortized time. For all three algorithms, the space required is always linear in the input graph
size $|E|$.



4 Experiments 

In this preliminary set of experiments we only tested the predictive performance of treeCutter($|V|$).
This corresponds to querying only the edges of the initial spanning tree T and predicting all
remaining edges $(i, j)$ via the parity of $\mathrm{Path}_T(i, j)$. The spanning tree T used by treeCutter is a
shortest-path spanning tree generated by a breadth-first visit of the graph (assuming all edges have
unit length). As the choice of the starting node in the visit is arbitrary, we picked the highest degree
node in the graph. Finally, we run through the adjacency list of each node in random order, which
we empirically observed to improve performance.

Our baseline is the heuristic ASymExp from [16] which, among the many spectral heuristics
proposed there, turned out to perform best on all our datasets. With integer parameter z,
ASymExp(z) predicts using a spectral transformation of the training sign matrix $Y_{\mathrm{train}}$, whose
only non-zero entries are the signs of the training edges. The label of edge $(i, j)$ is predicted using
$\big(\exp(Y_{\mathrm{train}}(z))\big)_{i,j}$. Here $\exp(Y_{\mathrm{train}}(z)) = U_z \exp(D_z) U_z^\top$, where $U_z D_z U_z^\top$ is the spectral
decomposition of $Y_{\mathrm{train}}$ containing only the z largest eigenvalues and their corresponding eigenvectors.
Following [16], we ran ASymExp(z) with the values z = 1, 5, 10, 15. This heuristic uses the
two-clustering bias as follows: expand $\exp(Y_{\mathrm{train}})$ in a series of powers $Y_{\mathrm{train}}^n$. Then each $(Y_{\mathrm{train}}^n)_{i,j}$ is
a sum of values of paths of length n between i and j. Each path has value 0 if it contains at least
one test edge, otherwise its value equals the product of queried labels on the path edges. Hence,
the sign of $\exp(Y_{\mathrm{train}})$ is the sign of a linear combination of path values, each corresponding to a
prediction consistent with the two-clustering bias (compare this to the multiplicative rule used by
treeCutter). Note that ASymExp and the other spectral heuristics from [16] all have running
times of order $\Omega(|V|^2)$.

We performed a first set of experiments on synthetic signed graphs created from a subset of the 
USPS digit recognition dataset. We randomly selected 500 examples labeled "1" and 500 examples 
labeled "7" (these two classes are not straightforward to tell apart). Then, we created a graph using 
a k-NN rule with k = 100. The edges were labeled as follows: all edges incident to nodes with
the same USPS label were labeled +1; all edges incident to nodes with different USPS labels were
labeled -1. Finally, we randomly pruned the positive edges so as to achieve an imbalance of about
20% between the two classes.$^{3}$ Starting from this edge label assignment, which is consistent with
the two-clustering associated with the USPS labels, we generated a p-stochastic label assignment 
by flipping the labels of a random subset of the edges. Specifically, we used the three following 
synthetic datasets: 

DELTA0: No flippings (p = 0), 1,000 nodes and 9,138 edges; 

DELTA100: 100 randomly chosen labels of DELTA0 are flipped; 

DELTA250: 250 randomly chosen labels of DELTA0 are flipped. 

We also used three real-world datasets: 

MOVIELENS: A signed graph we created using Movielens ratings.$^{4}$ We first normalized the
ratings by subtracting from each user rating the average rating of that user. Then, we created a
user-user matrix of cosine distance similarities. This matrix was sparsified by zeroing each entry
smaller than 0.1 and removing all self-loops. Finally, we took the sign of each non-zero entry. The
resulting graph has 6,040 nodes and 824,818 edges (12.6% of which are negative).

SLASHDOT: The biggest strongly connected component of a snapshot of the Slashdot social
network,$^{5}$ similar to the one used in [16]. This graph has 26,996 nodes and 290,509 edges (24.7%
of which are negative).

EPINIONS: The biggest strongly connected component of a snapshot of the Epinions signed
network,$^{6}$ similar to the one used in [18, 17]. This graph has 41,441 nodes and 565,900 edges
(26.2% of which are negative).

Slashdot and Epinions are originally directed graphs. We removed the reciprocal edges with 
mismatching labels (which turned out to be only a few), and considered the remaining edges as 
undirected. 

The following table summarizes the key statistics of each dataset: Neg. is the fraction of
negative edges, $|V|/|E|$ is the fraction of edges queried by treeCutter($|V|$), and Avgdeg is the
average degree of the nodes of the network.

$^{3}$ This is similar to the class imbalance of real-world signed networks (see below).

$^{4}$ www.grouplens.org/system/files/ml-1m.zip

$^{5}$ snap.stanford.edu/data/soc-sign-Slashdot081106.html

$^{6}$ snap.stanford.edu/data/soc-sign-epinions.html
Figure 3: F-measure against training set size for treeCutter($|V|$) and ASymExp(z) with different values of z on both synthetic and real-world
datasets. By construction, treeCutter never makes a mistake when the labeling is consistent with a two-clustering. So on DELTA0 treeCutter
does not make mistakes whenever the training set contains at least one spanning tree. With the exception of EPINIONS, treeCutter outperforms
ASymExp using a much smaller training set. We conjecture that ASymExp responds to the bias not as well as treeCutter, which on the other
hand is less robust than ASymExp to bias violations (supposedly, the labeling of EPINIONS).
Our results are summarized in Figure 3, where we plot F-measure (preferable to accuracy due
to the class imbalance) against the fraction of training (or query) set size. On all datasets except
MOVIELENS, the training set size for ASymExp ranges across the values 5%, 10%, 25%, and
50%. Since MOVIELENS has a higher density, we decided to reduce those fractions to 1%, 3%,
5% and 10%. treeCutter($|V|$) uses a single spanning tree, and thus we only have a single
query set size value. All results are averaged over ten runs of the algorithms. The randomness in
ASymExp is due to the random draw of the training set. The randomness in treeCutter($|V|$) is
caused by the randomized breadth-first visit.



5 Conclusions and work in progress 

We have built on the recent work [7], so as to generalize the results contained therein to query
budgets larger than $|V| - 1$ (the edge set size of a spanning tree). We also provided algorithms
which are easier to implement than low-stretch spanning trees. A research avenue we are currently
exploring is whether we can combine the edge information with information possibly contained in
the nodes of a network. The suite of papers [2, 4, 5, 3, 6] is a good starting point for this investigation.



6 Appendix with missing proofs

Proof of Theorem 2. By Fact 2, it suffices to show that the length of each path used for predicting
the test edges is bounded by $4k + 1$. For each $T' \in \mathcal{T}$, we have $D_{T'} \le 2k$, since the height of each
subtree is not bigger than $k$. Hence, any test edge incident to vertices of the same subtree $T' \in \mathcal{T}$
is predicted (Line 5 in Figure 1) using a path whose length is bounded by $2k < 4k + 1$. Any test
edge $(u, v)$ incident to vertices belonging to two different subtrees $T', T'' \in \mathcal{T}$ is predicted (Line
11 in Figure 1) using a path whose length is bounded by $D_{T'} + D_{T''} + 1 \le 2k + 2k + 1 = 4k + 1$,
where the extra $+1$ is due to the query edge $(i', i'')$ connecting $T'$ to $T''$ (Line 9 in Figure 1).

In order to prove that $|V| - 1 + \frac{|V|^2}{2k^2} + \frac{|V|}{2k}$ is an upper bound on the query set size, observe that
each query edge either belongs to $T$ or connects a pair of distinct subtrees contained in $\mathcal{T}$. The
number of edges in $T$ is $|V| - 1$, and the number of the remaining query edges is bounded by the
number of distinct pairs of subtrees contained in $\mathcal{T}$, which can be calculated as follows. First of
all, note that only the last subtree returned by extractTreelet may have a height smaller than
$k$, all the others must have height $k$. Note also that each subtree of height $k$ must contain at least
$k + 1$ vertices of $V_T$, while the subtree of $\mathcal{T}$ having height smaller than $k$ (if present) must contain
at least one vertex. Hence, the number of distinct pairs of subtrees contained in $\mathcal{T}$ can be upper
bounded by

$$\frac{|\mathcal{T}|\,(|\mathcal{T}| - 1)}{2} \le \frac{1}{2}\left(\frac{|V| - 1}{k + 1} + 1\right)\frac{|V| - 1}{k + 1} \le \frac{|V|^2}{2k^2} + \frac{|V|}{2k}.$$

This shows that the query set size cannot be larger than $|V| - 1 + \frac{|V|^2}{2k^2} + \frac{|V|}{2k}$.

Finally, observe that $D_T \le 2D_G$ because of the breadth-first visit generating $T$. If $D_T \le k$,
the subroutine extractTreelet is invoked only once, and the algorithm does not ask for any
additional label of $E_G \setminus E_T$ (the query set size equals $|V| - 1$). In this case $\mathbb{E}M$ is clearly upper
bounded by $2 D_G\, p\, |E|$. □

Proof of Theorem 3. In order to prove the claimed mistake bound, it suffices to show that each test
edge is predicted with a path whose length is at most 5. This is easily seen from the fact that the
diameters of two stars plus the query edge $(i', i'')$ that connects them sum to $2 + 2 + 1 = 5$,
which is therefore the diameter of the tree made up of two stars connected by the additional query
edge.

We continue by bounding the query set size from above. Let $S_j$ be the $j$-th star returned by
the $j$-th call to extractStar. The overall number of query edges can be bounded by $|V| - 1 + z$,
where $|V| - 1$ serves as an upper bound on the number of edges forming all the stars output by
extractStar, and $z$ is the sum over $j = 1, 2, \ldots$ of the number of stars $S_{j'}$ with $j' > j$ (i.e., $j'$
is created later than $j$) connected to $S_j$ by at least one edge.

Now, for any given $j$, the number of stars $S_{j'}$ with $j' > j$ connected to $S_j$ by at least one edge
cannot be larger than $\min\{|V|, |V_{S_j}|^2\}$. To see this, note that if there were a leaf $q$ of $S_j$ connected
to more than $|V_{S_j}| - 1$ vertices not previously included in any star, then extractStar would have
returned a star centered in $q$ instead. The repeated execution of extractStar can indeed be seen
as partitioning $V$. Let $\mathcal{V}$ be the set of all partitions of $V$. With this notation in hand, we can bound
$z$ as follows:

$$z \le \max_{P \in \mathcal{V}} \sum_{j=1}^{|P|} \min\{z_j^2(P),\, |V|\} \qquad (2)$$

where $z_j(P)$ is the number of nodes contained in the $j$-th element of the partition $P$,
corresponding to the number of nodes in $S_j$. Since $\sum_{j=1}^{|P|} z_j(P) = |V|$ for any $P \in \mathcal{V}$, it is easy to
see that the partition $P^*$ maximizing the above expression is such that $z_j(P^*) = \sqrt{|V|}$ for all $j$,
implying $|P^*| = \sqrt{|V|}$. We conclude that the query set size is bounded by $|V| - 1 + |V|^{3/2}$, as
claimed. □

Proof of Theorem 4. If the height of $T$ is not larger than $k$, then extractTreelet is invoked
only once and $\mathcal{T}$ contains the single tree $T$. The statement then trivially follows from the fact that
the length of the longest path in $T$ cannot be larger than twice the diameter of $G$. Observe that in
this case $|V_{G'}| = 1$.

We continue with the case when the height of $T$ is larger than $k$. We have that the length of
each path used in the prediction phase is bounded by 1 plus the sum of the diameters of two trees
of $\mathcal{T}$. Since these two trees are not higher than $k$, the mistake bound follows from Fact 2.

Finally, we combine the upper bound on the query set size in the statement of Theorem 3 with
the fact that each vertex of $V_{G'}$ corresponds to a tree of $\mathcal{T}$ containing at least $k + 1$ vertices of $G$.
This implies $|V_{G'}| \le \frac{|V|}{k+1}$, and the claim on the query set size of treeletStar follows. □

Proof of Theorem 5. A common tool shared by all three implementations is a preprocessing step. 

Given a subtree $T'$ of the input graph $G$ we preliminarily perform a visit of all its vertices (e.g.,
a depth-first visit) tagging each node by a binary label $y_i$ as follows. We start off from an arbitrary
node $i \in V_{T'}$, and tag it $y_i = +1$. Then, each adjacent vertex $j$ in $V_{T'}$ is tagged by $y_j = y_i \cdot Y_{i,j}$. The
key observation is that, after all nodes in $V_{T'}$ have been labeled this way, for any pair of vertices
$u, v \in V_{T'}$ we have $\pi_{T'}(u, v) = y_u \cdot y_v$, i.e., we can easily compute the parity of $\mathrm{Path}_{T'}(u, v)$ in
constant time. The total time taken for labeling all vertices in $V_{T'}$ is therefore $O(|V_{T'}|)$.
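
The tagging step can be illustrated with a minimal Python sketch. The function names `tag_tree` and `path_parity`, and the representation of the tree as a list of `(i, j, sign)` triples, are assumptions made for this illustration and do not come from the paper.

```python
from collections import defaultdict


def tag_tree(tree_edges, root):
    """Tag every node of a tree with a label in {+1, -1}.

    tree_edges: list of (i, j, sign) triples, sign being the observed
                label Y_ij of the tree edge (hypothetical input format).
    root:       arbitrary start node, tagged +1.
    """
    adj = defaultdict(list)
    for i, j, sign in tree_edges:
        adj[i].append((j, sign))
        adj[j].append((i, sign))

    y = {root: +1}
    stack = [root]              # iterative depth-first visit
    while stack:
        i = stack.pop()
        for j, sign in adj[i]:
            if j not in y:      # each node is tagged exactly once
                y[j] = y[i] * sign
                stack.append(j)
    return y


def path_parity(y, u, v):
    """Product of the edge signs along the tree path from u to v."""
    return y[u] * y[v]


# Small example tree: 0-1 (+), 1-2 (-), 2-3 (-), 1-4 (+).
example_edges = [(0, 1, +1), (1, 2, -1), (2, 3, -1), (1, 4, +1)]
example_tags = tag_tree(example_edges, root=0)
```

Each node is visited once, so the tagging is linear in the number of tree vertices, and every later parity lookup is a single multiplication.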

With the above fast tagging tool in hand, we are ready to sketch the implementation details of 
the three algorithms. 

Part 1. We draw the spanning tree $T$ of $G$ and tag as described above all its vertices in time
$O(|V|)$. We can execute the first 6 lines of the pseudocode in Figure 5 in time $O(|E|)$ as follows.

For each subtree $T_i \in \mathcal{T}$ rooted at $i$ returned by extractTreelet, we assign to each of its nodes
a pointer to its root $i$. This way, given any pair of vertices, we can now determine whether they
belong to the same subtree in constant time. We also mark node $i$ and all the leaves of each subtree.
This operation is useful when visiting each subtree starting from its root. Then the set $\mathcal{T}$ contains
just the roots of all the subtrees returned by extractTreelet. This takes $O(|V_T|)$ time. For each
$T' \in \mathcal{T}$ we also mark each edge in $E_{T'}$ so as to determine in constant time whether or not it is part
of $T'$. We visit the nodes of each subtree $T'$ whose root is in $\mathcal{T}$, and for any edge $(i, j)$ connecting
two vertices of $T'$, we predict in constant time $Y_{i,j}$ by $y_i \cdot y_j$. It is then easy to see that the total time
it takes to compute these predictions on all subtrees returned by extractTreelet is $O(|E|)$.

To finish up the rest, we allocate a vector $v$ of $|V|$ records, each record $v_i$ storing only one edge
in $E_G$ and its label. For each vertex $r \in \mathcal{T}$ we repeat the following steps. We visit the subtree $T'$
rooted at $r$. For brevity, denote by root($i$) the root of the subtree which $i$ belongs to. For any edge
connecting the currently visited node $i$ to a node $j \notin V_{T'}$, we perform the following operations:
if $v_{\mathrm{root}(j)}$ is empty, we query the label $Y_{i,j}$ and insert edge $(i, j)$ together with $Y_{i,j}$ in $v_{\mathrm{root}(j)}$. If
instead $v_{\mathrm{root}(j)}$ is not empty, we set $(i, j)$ to be part of the test set and predict its label as

$$\hat{Y}_{i,j} \leftarrow \pi_T(i, z') \cdot Y_{z',z''} \cdot \pi_T(z'', j) = y_i \cdot y_{z'} \cdot Y_{z',z''} \cdot y_{z''} \cdot y_j,$$

where $(z', z'')$ is the edge contained in $v_{\mathrm{root}(j)}$. We mark each predicted edge so as to avoid
predicting its label twice. We finally dispose of the content of vector $v$.

The execution of all these operations takes time overall linear in $|E|$, thereby concluding the
proof of Part 1.
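
The second phase of Part 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `predict_between_subtrees` and `query_label`, the dictionary-based inputs, and the use of a single `queried`/`predicted` lookup to skip edges already handled (in place of the edge marks described above) are choices made here, not details taken from the paper. The sketch also does not treat spanning-tree edges that cross two subtrees specially.

```python
def predict_between_subtrees(adj, root, y, query_label):
    """Query one edge per pair of adjacent subtrees, predict the others.

    adj:         dict node -> list of neighbours in G (undirected).
    root:        dict node -> root of the subtree containing the node.
    y:           dict node -> +1/-1 tag from the spanning-tree tagging.
    query_label: callable (i, j) -> observed sign (hypothetical oracle).
    Returns two dicts keyed by frozenset({i, j}): queried and predicted.
    """
    members = {}
    for node, r in root.items():
        members.setdefault(r, []).append(node)

    queried, predicted = {}, {}
    for r, nodes in members.items():
        v = {}  # plays the role of vector v: other root -> (z1, z2, label)
        for i in nodes:
            for j in adj[i]:
                edge = frozenset((i, j))
                if root[j] == r or edge in queried or edge in predicted:
                    continue  # same subtree, or edge already handled
                if root[j] not in v:
                    label = query_label(i, j)
                    queried[edge] = label
                    v[root[j]] = (i, j, label)
                else:
                    z1, z2, label = v[root[j]]
                    # parity(i, z1) * Y(z1, z2) * parity(z2, j)
                    predicted[edge] = y[i] * y[z1] * label * y[z2] * y[j]
        # v is discarded here, before the next subtree is processed
    return queried, predicted
```

Every edge is inspected a constant number of times from each endpoint, so the sketch keeps the linear dependence on the number of edges, assuming constant-time dictionary operations.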

Part 2. We rely on the notation just introduced. We exploit an additional data structure, which
takes extra $O(|V|)$ space. This is a heap $H$ whose records $h_i$ contain references to vertices $i \in V$.
Furthermore, we also create a link connecting $i$ to record $h_i$. The priority key ruling heap $H$ is the
degree of each vertex referred to by its records. With this data structure in hand, we are able to find
the vertex having the highest degree (i.e., the top element of the heap) in constant time. The heap
also allows us to execute in logarithmic time a pop operation, which eliminates the top element
from the heap.

In order to mimic the execution of the algorithm, we perform the following operations. We
create a star $S$ centered at the vertex referred to by the top element of $H$ connecting it with all the
adjacent vertices in $G$. We mark as "not-in-use" each leaf of $S$. Finally, we eliminate the element
pointing to the center of $S$ from $H$ (via a pop operation) and create a pointer from each leaf of $S$
to its central vertex. We keep creating such star graphs until $H$ becomes empty. Compared to the
creation of the first star, all subsequent stars essentially require the same sequence of operations.
The only difference with the former is that when the top element of $H$ is marked as not-in-use,
we simply pop it away. This is because any new star that we create is centered at a node that is
not part of any previously generated star. The time it takes to perform the above operations is
$O(|V| \log |V|)$.
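
A minimal Python sketch of this star-creation loop is given below. It assumes an adjacency-list dictionary with comparable node identifiers, uses the standard `heapq` module (a min-heap, hence the negated degrees), and lets a new star absorb only neighbours not yet assigned to a star. The function name `make_stars` and the `center_of` map, which stands in for both the leaf-to-center pointers and the "not-in-use" marks, are hypothetical.

```python
import heapq


def make_stars(adj):
    """Greedily split the vertices into stars, highest degree first.

    adj: dict node -> list of neighbours in G (undirected).
    Returns center_of, a dict mapping each node to the center of its star.
    """
    # heapq is a min-heap, so degrees are negated to pop the largest first.
    heap = [(-len(neighbours), i) for i, neighbours in adj.items()]
    heapq.heapify(heap)

    center_of = {}
    while heap:
        _, c = heapq.heappop(heap)
        if c in center_of:
            continue            # c is already a leaf: just pop it away
        center_of[c] = c        # c becomes the center of a new star
        for j in adj[c]:
            if j not in center_of:
                center_of[j] = c
    return center_of
```

Building the heap is linear, each of the $|V|$ pops is logarithmic, and scanning the neighbours of the centers touches every edge at most twice.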

Once we have created all the stars, we predict all the test edges the very same way as we
described for treeCutter (labeling the vertices of each star, using a set $\mathcal{T}$ containing all the star
centers and the vector $v$ for computing the predictions). Since for each edge we perform only a
constant number of operations, the proof of Part 2 is concluded.

Part 3. treeletStar($k$) can be implemented by combining the implementation of treeCutter
with the implementation of starMaker. In a first phase, the algorithm works as treeCutter,
creating a set $\mathcal{T}$ containing the roots of all the subtrees with diameter bounded by $k$.
We label all the vertices of each subtree and create a pointer from each node $i$ to root($i$). Then,
we visit all these subtrees and create a graph $G' = (V', E')$ having the following properties: $V'$
coincides with $\mathcal{T}$, and there exists an edge $(i, j) \in E'$ if and only if there exists at least one edge
connecting the subtree rooted at $i$ to the subtree rooted at $j$. We also use two vectors $u$ and $u'$,
both having $|V|$ components, mapping each vertex in $V$ to a vertex in $V'$, and vice versa. Using $H$
on $G'$, the algorithm splits the whole set of subtrees into stars of subtrees. The root of the subtree
which is the center of each star is stored in a set $\mathcal{S} \subseteq \mathcal{T}$. In addition to these operations, we create
a pointer from each vertex of $S_r$ to $r$, where $S_r$ denotes the star of subtrees centered at $r$. For each
$r \in \mathcal{S}$, the algorithm predicts the labels of all edges connecting pairs of vertices belonging to $S_r$
using a vector $v$ as for treeCutter. Then, it performs a visit of $S_r$ for the purpose of relabeling all
its vertices according to the query set edges that connect the subtree in the center of $S_r$ with all its
other subtrees. Finally, for each vertex of $\mathcal{S}$, we use vector $v$ as in treeCutter and starMaker for
selecting the query set edges connecting the stars of subtrees so created and for predicting all the
remaining test edges.

Now, $G'$ is a graph that can be created in $O(|E|)$ time. The time it takes for operating with $H$
on $G'$ is $O(|V'| \log |V'|) = O\big(\frac{|V|}{k} \log \frac{|V|}{k}\big)$, the equality deriving from the fact that each subtree
with diameter equal to $k$ contains at least $k + 1$ vertices, thereby making $|V'| \le \frac{|V|}{k+1}$. Since the
remaining operations need constant time per edge in $E$, this concludes the proof. □